Traps to Developers
A summary of some traps for developers. These traps are unintuitive things that are easily misunderstood and cause bugs.
This article spans a wide range of knowledge. If you find a mistake or have a suggestion, please leave a comment in the GitHub discussion.
Summarization of traps
HTML and CSS
- `min-width` is `auto` by default. Inside flexbox or grid, `min-width: auto` often makes the minimum width determined by content. It has higher priority than many other CSS attributes including `flex-shrink`, `width: 0` and `max-width: 100%`. It's recommended to set `min-width: 0`. See also
- Horizontal and vertical are different in CSS:
  - Normally `width: auto` tries to fill the available space in the parent. But `height: auto` normally just expands to fit content.
  - For inline elements, inline-block elements and float elements, `width: auto` does not try to expand.
  - `margin: 0 auto` centers horizontally. But `margin: auto 0` normally becomes `margin: 0 0`, which does not center vertically. In a flexbox with `flex-direction: column`, `margin: auto 0` can center vertically.
  - Margin collapse happens vertically but not horizontally.
  - The above flips when layout direction flips (e.g. `writing-mode: vertical-rl`).
- Block formatting context (BFC):
  - `display: flow-root` creates a BFC. (There are other ways to create a BFC, like `overflow: hidden`, `overflow: auto`, `overflow: scroll`, `display: table`, but with side effects.)
  - Margin collapse. Two vertically touching siblings can overlap margins. Child margin can "leak" outside of the parent. Margin collapse can be avoided by a BFC. Margin collapse also doesn't happen when `border` or `padding` is specified.
  - If a parent only contains floating children, the parent's height will collapse to 0. Can be fixed by a BFC.
- In these cases, it will start a new stacking context:
  - The attributes that give special rendering effects (`transform`, `filter`, `perspective`, `mask`, `opacity` etc.) will create a new stacking context
  - `position: fixed` or `position: sticky` creates a stacking context
  - Specifies `z-index` and `position` is `absolute` or `relative`
  - Specifies `z-index` and the element is inside flexbox or grid
  - `isolation: isolate`
  - ...

  Stacking context can cause these behaviors:
  - `z-index` doesn't work across stacking contexts. It only works within a stacking context.
  - Stacking context can affect the coordinate of `position: absolute` or `fixed`. (The underlying logic is complex, see also )
  - `position: sticky` doesn't work across stacking contexts.
  - `overflow: visible` will still be clipped by a stacking context
  - `background-attachment: fixed` will position based on the stacking context
- On mobile browsers, the top address bar and bottom navigation bar can go off screen when you scroll down. `100vh` corresponds to the height when the top bar and bottom bar are off screen, which is larger than the height when the two bars are on screen. The modern solution is `100dvh`.
- `width: 100vw` sets the width that excludes the scrollbar to `100vw`, so the total width (including the scrollbar) can overflow horizontally. `width: 100%` avoids that issue.
- `position: absolute` is not based on its parent. It's based on its nearest positioned ancestor (the nearest ancestor whose `position` is `relative` or `absolute`, or that creates a stacking context).
- If the parent's `display` is `flex` or `grid`, then the child's `float` has no effect.
- If the parent's width/height is not pre-determined, then percent width/height (e.g. `width: 50%`, `height: 100%`) doesn't work. (It avoids a circular dependency where parent height is determined by content height, but content height is determined by parent height.)
- `display: inline` ignores `width`, `height`, `margin-top` and `margin-bottom`.
- Whitespace collapse. See also
  - By default, newlines in HTML are treated as spaces. Multiple spaces together collapse into one.
  - `<pre>` can avoid collapsing whitespace but has weird behavior at the beginning and end of content.
  - Often the spaces at the beginning and end of content are ignored, but this doesn't happen in `<a>`.
  - Any space or line break between two `display: inline-block` elements will be rendered as spacing. This doesn't happen in flexbox or grid.
- `text-align` aligns text and inline things, but doesn't align block elements (e.g. normal divs).
- By default `width` and `height` don't include padding and border. `width: 100%` with `padding: 10px` can still overflow the parent. `box-sizing: border-box` makes the width/height include border and padding.
- Cumulative Layout Shift. It's recommended to specify the `width` and `height` attributes in `<img>` to avoid layout shift due to image loading delay.
- File download requests are not shown in Chrome DevTools, because it only shows networking in the current tab, but a file download is treated as being in another tab. To inspect a file download request, use `chrome://net-export/`.
- JS-in-HTML may interfere with HTML parsing. For example `<script>console.log('</script>')</script>` makes the browser treat the first `</script>` as the ending tag. See also
Unicode and text
- Two concepts: code point, grapheme cluster:
  - Grapheme cluster is the "unit of character" in GUI.
  - For visible ASCII characters, one character is one code point and also one grapheme cluster.
  - An emoji is a grapheme cluster, but it may consist of many code points.
- In UTF-8, a code point can be 1, 2, 3 or 4 bytes. The byte count does not necessarily equal the code point count.
- In UTF-16, each UTF-16 code unit is 2 bytes. A code point can be 1 code unit (2 bytes) or 2 code units (4 bytes, surrogate pair).
- JSON string `\u` escape uses surrogate pairs. `"\uD83D\uDE00"` in JSON has only one code point.
- Different in-memory string behaviors in different languages:
  - Rust uses UTF-8 for in-memory strings. `s.len()` gives byte count. Rust does not allow directly indexing on a `str` (but allows subslicing). `s.chars().count()` gives code point count. Rust is strict about UTF-8 code point validity (for example Rust doesn't allow a subslice to cut on an invalid code point boundary).
  - Java, C# and JS's string encoding is similar to UTF-16 [1]. String length is code unit count. Indexing works on code units. Each code unit is 2 bytes. One code point can be 1 code unit or 2 code units.
  - In Python, `len(s)` gives code point count. Indexing gives a string that contains one code point.
  - Golang string has no encoding constraint and is similar to a byte array. String length and indexing work the same as on a byte array. But the most commonly used encoding is UTF-8. See also
  - In C++, `std::string` has no encoding constraint and is similar to a byte array. String length and indexing are based on bytes.
  - No language mentioned above does string length and indexing based on grapheme clusters.
  - In SQL, `varchar(100)` limits 100 code points (not bytes) in databases such as MySQL and PostgreSQL. Some other databases count bytes instead.
- Some text files have a byte order mark (BOM) at the beginning. For example, FE FF means the file is in big-endian UTF-16. EF BB BF means UTF-8. It's mainly used in Windows. Some non-Windows software does not handle BOM.
- When converting binary data to string, often the invalid places are replaced by � (U+FFFD).
- Confusable characters.
- Normalization. For example é can be U+00E9 (one code point) or U+0065 U+0301 (two code points). String comparison works on binary data and doesn't consider normalization.
- Zero-width characters, invisible characters.
- Line break. Windows often uses CRLF `\r\n` for line breaks. Linux and macOS often use LF `\n` for line breaks.
- Locale (elaborated below).
Floating point
- NaN. Floating point NaN is not equal to any number including itself. `NaN == NaN` is always false (even if the bits are the same). `NaN != NaN` is always true. Computing on NaN usually gives NaN (it can "contaminate" computation).
- There are +Inf and -Inf. They are not NaN.
- There is a negative zero -0.0 which is different from normal zero. The negative zero equals zero when using floating point comparison. Normal zero is treated as "positive zero". The two zeros behave differently in some computations (e.g. `1.0 / 0.0 == Inf` but `1.0 / -0.0 == -Inf`, `atan2(0.0, 0.0) == 0` but `atan2(0.0, -0.0) == π`). Note that under IEEE 754 `log(0.0)` and `log(-0.0)` both give -Inf.
- JSON standard doesn't allow NaN or Inf:
  - JS `JSON.stringify` turns NaN and Inf into null.
  - Python `json.dumps(...)` will directly write `NaN`, `Infinity` into the result, which is not compliant with the JSON standard. `json.dumps(..., allow_nan=False)` will raise `ValueError` if the data has NaN or Inf.
  - Golang `json.Marshal` will give an error if the data has NaN or Inf.
- Directly comparing floating point numbers for equality may fail due to precision loss. Compare equality by things like `abs(a - b) < 0.00001`.
- JS uses floating point for all numbers. The max "safe" integer is `2^53 - 1`. The "safe" here means every integer in range can be accurately represented. Outside of the safe range, most integers will be inaccurate. For large integers it's recommended to use `BigInt`.

  If a JSON contains an integer larger than that, and JS deserializes it using `JSON.parse`, the number in the result will likely be inaccurate. The workaround is to use other ways of deserializing JSON or use string for large integers.

  (Putting a millisecond timestamp integer in JSON is fine, as a millisecond timestamp only exceeds the limit in year 287396. But a nanosecond timestamp suffers from that issue.)
- Associativity law and distribution law don't strictly hold because of precision loss. Parallelizing matrix multiplication and sum dynamically using these laws can be non-deterministic. See also: Defeating Nondeterminism in LLM Inference
- Division is much slower than multiplication (unless using approximation). Dividing many numbers by one number can be optimized by first computing the reciprocal and then multiplying by the reciprocal.
- These things can make different hardware have different floating point computation results:
  - Hardware FMA (fused multiply-add) support. `fma(a, b, c) = a * b + c` (in some places `a + b * c`). Most modern hardware makes the intermediary result in FMA have higher precision. Some old hardware or embedded processors don't do that and treat it as a normal multiply and add.
  - Floating point has a subnormal range to make very-close-to-zero numbers more accurate. Most modern hardware can handle them, but some old hardware and embedded processors treat subnormals as zero.
  - Rounding mode. The standard allows different rounding modes like round-to-nearest-ties-to-even (RNTE) or round-toward-zero (RTZ).
    - In x86 and ARM, rounding mode is thread-local mutable state that can be set by special instructions. It's not recommended to touch the rounding mode as it can affect other code.
    - In GPU, there is no mutable state for rounding mode. Rasterization often uses RNTE rounding mode. In CUDA different rounding modes are selected by different instructions.
  - Math functions (e.g. sin, log) may be less accurate in some embedded hardware or old hardware.
  - x86 has a legacy FPU which has 80-bit floating point registers and per-core rounding mode state. It's recommended to not use it.
  - ......
- Floating point accuracy is low for values with very large absolute value or values very close to zero. It's recommended to avoid temporary results that have very large absolute value or are very close to zero.
- Iteration can cause error accumulation. For example, if something needs to rotate 1 degree every frame, don't cache the matrix and multiply by a 1-degree rotation matrix every frame. Compute the angle based on time, then re-calculate the rotation matrix from the angle.
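
Several of the floating point traps above can be reproduced in a few lines. Below is a minimal Python sketch. Python is used only for illustration: Python raises an exception on `1.0 / 0.0`, so the sign of zero is shown with `copysign` and `atan2` instead.

```python
import json
import math

nan = float("nan")
nan_equals_itself = nan == nan   # False
nan_differs = nan != nan         # True

# -0.0 compares equal to 0.0 but still carries a sign.
neg_zero = -0.0
zeros_equal = neg_zero == 0.0                    # True
sign_of_neg_zero = math.copysign(1.0, neg_zero)  # -1.0
angle_pos = math.atan2(0.0, 0.0)                 # 0.0
angle_neg = math.atan2(0.0, neg_zero)            # pi

# Precision loss: compare with a tolerance instead of ==.
exact_equal = (0.1 + 0.2) == 0.3                 # False
close_enough = math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)  # True

# Associativity does not strictly hold.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
associative = left == right                      # False

# Above 2**53 a double cannot represent every integer.
collides = float(2**53) == float(2**53 + 1)      # True

# json.dumps writes non-standard NaN unless allow_nan=False.
non_standard = json.dumps({"x": nan})            # '{"x": NaN}'
try:
    json.dumps({"x": nan}, allow_nan=False)
    strict_rejects_nan = False
except ValueError:
    strict_rejects_nan = True
```

The tolerance in `math.isclose` is an arbitrary choice here. A suitable tolerance depends on the magnitude of the values being compared.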
Time
- Leap second. Unix timestamp is "transparent" to leap second, which means converting between Unix timestamp and UTC time ignores leap second. A common solution is leap smear: make the time measured in Unix timestamp stretch or squeeze near a leap second.
- Time zone. UTC and Unix timestamp are globally uniform. But human-readable time is time-zone-dependent. It's recommended to store timestamps in the database and convert to human-readable time in the UI, instead of storing human-readable time in the database.
- Daylight Saving Time (DST): In some regions people adjust clocks forward by one hour in warm seasons.
- Time may "go backward" due to NTP sync.
- It's recommended to configure the server's time zone as UTC. Different nodes having different time zones will cause trouble in distributed system. After changing system time zone, the database may need to be reconfigured or restarted.
- There are two clocks: hardware clock and system clock. The hardware clock itself doesn't care about time zone. Linux treats it as UTC by default. Windows treats it as local time by default.
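
The "store a timestamp, convert at display time" advice can be sketched as below. This is a minimal Python illustration with an arbitrary timestamp value. The two fixed UTC offsets are a simplification chosen for the example, since real time zones need a time zone database to handle DST.

```python
from datetime import datetime, timedelta, timezone

# What gets stored: a time-zone-independent Unix timestamp (seconds).
stored = 1_700_000_000

# Conversion to human-readable time happens only at display time.
as_utc = datetime.fromtimestamp(stored, tz=timezone.utc)

# Hypothetical viewers at fixed offsets (no DST rules applied here).
utc_plus_9 = timezone(timedelta(hours=9))
utc_minus_5 = timezone(timedelta(hours=-5))
shown_plus_9 = as_utc.astimezone(utc_plus_9)
shown_minus_5 = as_utc.astimezone(utc_minus_5)

print(as_utc.isoformat())         # 2023-11-14T22:13:20+00:00
print(shown_plus_9.isoformat())   # 2023-11-15T07:13:20+09:00
print(shown_minus_5.isoformat())  # 2023-11-14T17:13:20-05:00

# All three values are the same instant, only the rendering differs.
same_instant = shown_plus_9 == shown_minus_5 == as_utc
```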
Java
- `==` compares object references. Should use `.equals` to compare object content.
- Forgetting to override `equals` and `hashCode`. It will use object identity equality by default in map keys and sets.
- Mutating the content of a map key object (or set element object) makes the container malfunction.
- A method that returns `List<T>` may sometimes return a mutable `ArrayList`, but sometimes return `Collections.emptyList()` which is immutable. Trying to modify `Collections.emptyList()` throws `UnsupportedOperationException`.
- A method that returns `Optional<T>` may return `null` (this is not recommended, but it does exist in real codebases).
- Null is ambiguous. If `get()` on a map returns null, either the value is missing or the value exists but is null (can distinguish by `containsKey`). Null field and missing field in JSON are both mapped to null in a Java object. See also
- Implicitly converting `Integer`, `Long`, `Double` etc. to `int`, `long`, `double` etc. can cause `NullPointerException`.
- Return in a `finally` block swallows any exception thrown in the `try` or `catch` block. The method will return the value from `finally`.
- Interrupt. Some libraries ignore interrupts. If a thread is interrupted and then loads a class, and class initialization has IO, then the class may fail to load.
- Thread pool does not log exceptions of tasks sent by `.submit()` by default. You can only get the exception from the future returned by `.submit()`. Don't discard the future. And a `scheduleAtFixedRate` task silently stops if an exception is thrown.
- Literal number starting with 0 will be treated as an octal number. (`0123` is 83)
- When debugging, the debugger will call `.toString()` on local variables. Some classes' `.toString()` has side effects, which causes the code to run differently under a debugger. This can be disabled in the IDE.
- Before Java 24 a virtual thread can be "pinned" when blocking on a `synchronized` lock, which may cause deadlock. It's recommended to upgrade to Java 24 if you use virtual threads.
- It's not recommended to override `finalize()`. If `finalize()` runs too slowly, it can block GC. Exceptions thrown out of `finalize()` are not logged. A dead object can resurrect itself in `finalize()`, and if a resurrected object becomes dead again, `finalize()` won't be called again. Use `Cleaner` for GC-directed disposal.
Golang
- `append()` reuses the memory region if capacity allows. Appending to a subslice can overwrite the parent if they share a memory region.
- `defer` executes when the function returns, not when the lexical scope exits.
- `defer` captures mutable variables.
- About `nil`:
  - There are nil slice and empty slice (the two are different). But there is no nil string, only empty string. The nil map can be read like an empty map, but a nil map cannot be written.
  - Interface `nil` weird behavior. An interface value is a fat pointer containing type info and a data pointer. If the data pointer is null but the type info is not null, then it will not equal `nil`.
- Before Go 1.22, loop variable capture issue.
- Dead wait. Understanding Real-World Concurrency Bugs in Go
- Different kinds of timeout. The complete guide to Go net/http timeouts
- Having an interior pointer to an object keeps the whole object alive. This may cause memory leaks.
C/C++
- Storing a pointer to an element inside a `std::vector` and then growing the vector: `vector` may re-allocate its content, making the element pointer no longer valid.
- `std::string` created from a string literal may be temporary. Taking `c_str()` from a temporary string is wrong.
- Iterator invalidation. Modifying a container while looping over it.
- `std::remove` doesn't remove but just rearranges elements. `erase` actually removes.
- Literal number starting with 0 will be treated as an octal number. (`0123` is 83)
- Destructing a deep tree structure can stack overflow. Solution is to replace recursion with a loop in the destructor.
- Undefined behaviors. Compiler optimization aims to keep defined behavior the same, but can freely change undefined behavior. Relying on undefined behavior can make the program break under optimization. See also
  - Accessing uninitialized memory is undefined behavior. Converting a `char*` to a struct pointer can be seen as accessing uninitialized memory, because the object lifetime hasn't started. It's recommended to put the struct elsewhere and use `memcpy` to initialize it.
  - Accessing invalid memory (e.g. null pointer) is undefined behavior.
  - Signed integer overflow/underflow is undefined behavior. Unsigned integer arithmetic is defined to wrap around, so note that subtracting below 0 silently gives a huge value.
  - Aliasing.
    - Aliasing means multiple pointers point to the same place in memory.
    - Strict aliasing rule: If there are two pointers with type `A*` and `B*`, then the compiler assumes the two pointers can never be equal. If they are equal, it's undefined behavior. Except in two cases: 1. `A` and `B` have a subtyping relation 2. converting a pointer to a byte pointer (`char*`, `unsigned char*` or `std::byte*`) (the reverse does not apply).
    - Pointer provenance. Two pointers from two different provenances are treated as never aliasing. If their addresses are equal, it's undefined behavior. See also
- Alignment.
  - For example, a 64-bit integer's address needs to be divisible by 8. In ARM, accessing memory in an unaligned way can cause a crash.
  - Unaligned memory access is undefined behavior.
  - Directly treating a part of a byte buffer as a struct is undefined behavior. Not only due to alignment, but also due to the object lifetime not having started [2].
  - Alignment can cause padding in structs that wastes space.
  - Some SIMD instructions only work with aligned data. For example, AVX instructions usually require 32-byte alignment.
Python
- Default argument is a stored value that will not be re-created on every call.
- Be careful about indentation when copying and pasting Python code.
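
The default argument trap looks like this in practice. A minimal sketch, with hypothetical function names chosen only for illustration:

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call without `bucket` shares the same list.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a sentinel and create a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_bad = append_bad(1)
second_bad = append_bad(2)
print(second_bad)               # [1, 2]
print(first_bad is second_bad)  # True

first_good = append_good(1)
second_good = append_good(2)
print(first_good, second_good)  # [1] [2]
```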
SQL Databases
- Null is special.
  - `x = null` doesn't work. `x is null` works. Null does not equal itself, similar to NaN.
  - Unique index allows duplicate nulls (except in Microsoft SQL Server).
  - `select distinct` may treat nulls as the same (this is database-specific).
  - `count(x)` and `count(distinct x)` ignore rows where `x` is null.
- Date implicit conversion can be timezone-dependent.
- Complex join with distinct may be slower than a nested query. See also
- In MySQL (InnoDB), if a string field doesn't have `character set utf8mb4` then it will error if you try to insert text containing a 4-byte UTF-8 code point.
- MySQL (InnoDB) defaults to case-insensitive.
- MySQL (InnoDB) can do implicit conversion by default. `select '123abc' + 1;` gives 124.
- MySQL (InnoDB) gap lock may cause deadlock.
- In MySQL (InnoDB) you can select a field and group by another field. It gives a nondeterministic result.
- In SQLite the field type doesn't matter unless the table is `strict`.
- SQLite by default does not do vacuum. The file size only increases and won't shrink. To make it shrink you need to either manually run `vacuum;` or enable `auto_vacuum`.
- Foreign key may cause implicit locking, which may cause deadlock.
- Locking may break repeatable read isolation (it's database-specific).
- Distributed SQL databases may not support locking or may have weird locking behaviors. It's database-specific.
- If the backend has an N+1 query issue, the slowness may not be shown in the slow query log, because the backend does many small queries serially and each individual query is fast.
- Long-running transactions can cause problems (e.g. locking). It's recommended to make all transactions finish quickly.
- If a string column is used in an index or primary key, it will have a length limit. MySQL applies the limitation when changing table schema. PostgreSQL applies the limitation by erroring when inserting or updating data.
- Whole-table locks that can make the service temporarily unusable:
  - In MySQL (InnoDB) 8.0+, adding a unique index or foreign key is mostly concurrent (only briefly locks) and won't block operations. But in older versions it may do a whole-table lock.
  - `mysqldump` used without `--single-transaction` causes a whole-table read lock.
  - In PostgreSQL, `create unique index` or `alter table ... add foreign key` causes a whole-table read lock. To avoid that, use `create unique index concurrently` to add a unique index. For a foreign key, use `alter table ... add foreign key ... not valid;` then `alter table ... validate constraint ...`.
- About ranges:
  - If you store non-overlapping ranges, querying the range containing a point by `select ... from ranges where p >= start and p <= end` is inefficient (even when having a composite index of `(start, end)`). An efficient way: `select * from (select ... from ranges where start <= p order by start desc limit 1) where end >= p` (only requires an index on the `start` column).
  - For overlappable ranges, a normal B-tree index is not sufficient for efficient querying. It's recommended to use a spatial index in MySQL and GiST in PostgreSQL.
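
The null behaviors and the range lookup trick can be tried with an in-memory SQLite database. This is a minimal Python sketch. It only shows SQLite's behavior, and the table and column names (`t`, `ranges`, `lo`, `hi`, `label`) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (x integer)")
conn.execute("create unique index t_x on t (x)")
# Two nulls do not violate the unique index in SQLite.
conn.executemany("insert into t (x) values (?)", [(1,), (None,), (None,)])


def scalar(sql):
    return conn.execute(sql).fetchone()[0]


eq_null = scalar("select count(*) from t where x = null")   # 0: never matches
is_null = scalar("select count(*) from t where x is null")  # 2
count_star = scalar("select count(*) from t")               # 3
count_x = scalar("select count(x) from t")                  # 1: nulls ignored
null_eq_null = scalar("select null = null")                 # None (SQL null)

# Non-overlapping ranges: find the range containing a point using only
# the index on the start column (here `lo`, the primary key).
conn.execute("create table ranges (lo integer primary key, hi integer, label text)")
conn.executemany(
    "insert into ranges values (?, ?, ?)",
    [(0, 9, "a"), (20, 29, "b"), (40, 49, "c")],
)


def find_range(p):
    row = conn.execute(
        "select label from "
        "(select lo, hi, label from ranges where lo <= ? order by lo desc limit 1) "
        "where hi >= ?",
        (p, p),
    ).fetchone()
    return row[0] if row else None


print(find_range(25), find_range(15), find_range(-5))  # b None None
```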
Concurrency and Parallelism
- `volatile`:
  - `volatile` itself cannot replace locks. `volatile` itself doesn't provide atomicity.
  - You don't need `volatile` for data protected by a lock. Locking can already establish memory order and prevent some wrong optimizations.
  - In C/C++, `volatile` only avoids some wrong optimizations, and won't automatically add memory barrier instructions for `volatile` access.
  - In Java, `volatile` accesses have sequentially-consistent ordering (JVM will use memory barrier instructions if needed).
  - In C#, accesses to the same `volatile` value have release-acquire ordering (CLR will use memory barrier instructions if needed).
  - `volatile` can avoid wrong optimization related to reordering and merging memory reads/writes. (Compiler can merge reads by caching a value in a register. Compiler can merge writes by only writing to a register and delaying writing to memory. A read after a write can be optimized out.)
- Time-of-check to time-of-use (TOCTOU).
- In SQL databases, for special uniqueness constraints that don't fit a simple unique index (e.g. unique across two tables, conditional unique, unique within time range), if the constraint is enforced by the application, then:
  - In MySQL (InnoDB), if in repeatable read level, the application checks using `select ... for update` then inserts, and the unique-checked column has an index, then it works due to gap lock. (Note that gap lock may cause deadlock under high concurrency. Ensure deadlock detection is on and use retrying.)
  - In PostgreSQL, if in repeatable read level, the application checks using `select ... for update` then inserts, it's not sufficient to enforce the constraint under concurrency (due to write skew). Some solutions:
    - Use serializable level
    - Don't rely on the application to enforce the constraint:
      - For conditional unique, use a partial unique index.
      - For the uniqueness across two tables case, insert redundant data into one extra table with a unique index.
      - For the time range exclusiveness case, use range type and exclude constraint.
- Atomic reference counting (`Arc`, `shared_ptr`) can be slow when many threads frequently change the same counter. See also
- About read-write lock: trying to acquire the write lock while holding the read lock can deadlock. The correct way is to first release the read lock, then acquire the write lock, and the conditions that were checked under the read lock need to be re-checked.
- Reentrant lock:
  - Reentrant means one thread can lock twice (and unlock twice) without deadlocking. Java's `synchronized` and `ReentrantLock` are reentrant.
  - Non-reentrant means if one thread locks twice, it will deadlock. Rust `Mutex` and Golang `sync.Mutex` are not reentrant.
- False sharing of the same cache line costs performance.
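
The reentrant versus non-reentrant distinction can be observed without actually deadlocking by making the second acquire non-blocking. Below is a minimal sketch using Python's `threading.Lock` and `threading.RLock`. These are stand-ins for the Java, Rust and Go locks named above, and the helper name is hypothetical.

```python
import threading


def can_lock_twice(lock):
    """Return True if the current thread can acquire `lock` while already holding it."""
    lock.acquire()
    try:
        # Non-blocking attempt: returns False instead of deadlocking.
        second = lock.acquire(blocking=False)
        if second:
            lock.release()
        return second
    finally:
        lock.release()


plain = can_lock_twice(threading.Lock())       # non-reentrant
reentrant = can_lock_twice(threading.RLock())  # reentrant
print(plain, reentrant)  # False True
```

With a blocking second `acquire()`, the `threading.Lock` case would hang forever, which is the deadlock described above.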
Common in many languages
- Forgetting to check for null/None/nil.
- Modifying a container while looping over it. Single-thread "data race".
- Unintended sharing of mutable data. For example in Python `[[0] * 10] * 10` does not create a proper 2D array.
- For non-negative integers `(low + high) / 2` may overflow. A safer way is `low + (high - low) / 2`.
- Short circuit. `a() || b()` will not run `b()` if `a()` returns true. `a() && b()` will not run `b()` when `a()` returns false.
- When using a profiler: the profiler may by default only include CPU time, which excludes waiting time. If your app spends 90% of its time waiting on the database, the flamegraph may not include that 90%, which is misleading.
Linux and bash
- If the current directory is moved, `pwd` still shows the original path. `pwd -P` shows the real path.
- `cmd > file 2>&1` makes both stdout and stderr go to the file. But `cmd 2>&1 > file` only makes stdout go to the file and doesn't redirect stderr.
- File name is case sensitive (unlike Windows).
- There is a capability system for executables, apart from the file permission system. Use `getcap` to see capabilities.
- Unset variables. If `DIR` is unset, `rm -rf $DIR/` becomes `rm -rf /`. Using `set -u` can make bash error when encountering an unset variable.
- If you want a script to add variables and aliases to the current shell, it should be executed by using `source script.sh`, instead of directly executing it. But the effect of `source` is not permanent and doesn't apply after re-login. It can be made permanent by putting it into `~/.bashrc`.
- Bash caches the mapping between command name and file path of the command. If you move one file in `$PATH` then using that command gives ENOENT. Refresh the cache using `hash -r`.
- Using a variable unquoted will make its line breaks be treated as spaces.
- `set -e` can make the script exit immediately when a sub-command fails, but it doesn't work inside a function whose result is condition-checked (e.g. the left side of `||`, `&&`, condition of `if`). See also
- K8s `livenessProbe` used with a debugger. A breakpoint debugger usually blocks the whole application, making it unable to respond to health check requests, so it can be killed by the K8s `livenessProbe`.
React
- Modifying state in rendering code.
- React compares equality using reference equality, not content equality.
  - The objects and arrays that are newly created in rendering are treated as always-new. Use `useMemo` to fix.
  - The closure functions that are created in rendering are also always-new. Use `useCallback` to fix.
  - If an always-new thing is put into the `useEffect` dependency array, the effect will run on every render. See also Cloudflare incident 2025 Sept-12.
  - Don't forget to include dependencies in the dependency array. And the dependencies also need to be memoized.
- When using an effect to manage `setInterval` and `clearInterval`, if the effect has a dependency value, it will remove the timer and re-add the timer when the dependency changes, which can mess up the timing.
- State objects themselves should be immutable. Don't directly set fields of state objects. Always recreate the whole object.
- Forgetting to include a value in the `useEffect` dependency array.
- Forgetting to clean up in `useEffect`.
- Closure trap. A closure can capture a state. If the state changes, the closure still captures the old state.
  - One solution is to make the closure not capture state and access state within `useReducer`.
  - Another solution is to put state in `useRef` (note that changing the value inside a ref doesn't trigger re-rendering, you need to change state or prop to trigger re-rendering).
- `useEffect` first runs after the component DOM is present in the web page. Doing initialization in `useEffect` may cause visual flicker. Use `useLayoutEffect` for early initialization.
- When using a ref to get a DOM object, it won't be accessible during first rendering (component function call). It can be accessed in `useLayoutEffect`.
Git
- Rebase can rewrite history. After rebasing a local branch, a normal push will give a weird result (because history is rewritten). Rebase should be used with force push. If the remote branch's history is rewritten, pulling should use `--rebase`.
  - Force pushing with `--force-with-lease` can sometimes avoid overwriting other developers' commits. But if you fetch and then don't pull, `--force-with-lease` cannot protect.
- Reverting a merge doesn't fully cancel the side effect of the merge. If you merge B to A and then revert, merging B to A again has no effect. One solution is to revert the revert of the merge. (A cleaner way to cancel a merge, instead of reverting the merge, is to back up the branch, then hard reset to the commit before the merge, then cherry pick commits after the merge, then force push.)
- In GitHub, if you accidentally committed a secret (e.g. API key) and pushed it to public, even if you override it using force push, GitHub will still record that secret. See also Example activity tab
- In GitHub, if there is a private repo A and you forked it as B (also private), then when A becomes public, the private repo B's content is also publicly accessible, even after deleting B. See also.
- GitHub by default allows deleting a release tag, and adding a new tag with the same name, pointing to another commit. It's not recommended to do that. Many build systems cache based on release tag, which breaks under that. It can be disabled in rulesets configuration.
- `git stash pop` does not drop the stash if there is a conflict.
- In Windows, Git often auto-converts cloned text files to CRLF line endings. But in WSL much software (e.g. bash) doesn't work with files with CRLF. Using `git clone --config core.autocrlf=false -c core.eol=lf ...` can make git clone as LF.
- macOS automatically adds `.DS_Store` files into every folder. It's recommended to add `**/.DS_Store` into `.gitignore`.
Networking
- Some routers and firewalls silently kill idle TCP connections without telling the application. Some code (like HTTP client libraries, database clients) keeps a pool of TCP connections for reuse, which can be silently invalidated (using these TCP connections will get RST). To solve it, configure system TCP keepalive. See also
  - Note that HTTP/1.0 Keep-Alive is different from TCP keepalive.
- The result of `traceroute` is not reliable. See also. Sometimes tcptraceroute is useful.
- TCP slow start can increase latency. Can be fixed by disabling `tcp_slow_start_after_idle`. See also
- TCP sticky packet. Nagle's algorithm delays packet sending. It will increase latency. Can be fixed by enabling `TCP_NODELAY`. See also
- If you put your backend behind Nginx, you need to configure connection reuse, otherwise under high concurrency, connections between Nginx and the backend may fail, due to not having enough internal ports.
- Nginx `proxy_buffering` delays SSE.
- The HTTP protocol does not explicitly forbid GET and DELETE requests from having a body. Some places do use a body in GET and DELETE requests. But many libraries and HTTP servers do not support them.
- One IP can host multiple websites, distinguished by domain name. The HTTP header `Host` and SNI in the TLS handshake carry the domain name, which is important. Some websites cannot be accessed via IP address.
- CORS (cross-origin resource sharing). For requests to another website (origin), the browser will prevent JS from getting the response, unless the server's response contains the header `Access-Control-Allow-Origin` and it matches the client website. This requires configuring the backend. If you want to pass cookies to another website it involves more configuration.

  Generally, if your frontend and backend are in the same website (same domain name and port) then there is no CORS issue.
- Reverse path filtering. When routing is asymmetric (packets from A to B use a different interface than packets from B to A), reverse path filtering rejects valid packets.
- In old versions of Linux, if `tcp_tw_recycle` is enabled, it aggressively recycles connections based on TCP timestamp. NAT and load balancers can make the TCP timestamp not monotonic, so that feature can drop normal connections.
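
Besides system-wide settings, TCP keepalive and `TCP_NODELAY` can also be turned on per socket by the application. Below is a minimal Python sketch using the standard `socket` module. It only sets and reads back the options on an unconnected socket. Tuning the keepalive idle time and interval is platform-specific and not shown here.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Ask the OS to send keepalive probes on this connection when it is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Disable Nagle's algorithm so small writes are sent without delay.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    # getsockopt returns a non-zero integer when the option is enabled.
    keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    nodelay_on = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
finally:
    sock.close()

print(keepalive_on, nodelay_on)
```

Many HTTP and database client libraries expose their own settings for these options, which is usually a better place to configure them than raw sockets.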
Locale
- The upper case and lower case can be different in other natural languages. In Turkish (tr-TR) the lowercase of `I` is `ı` and the upper case of `i` is `İ`. The `\w` (word char) in regular expressions can be locale-dependent.
- Letter ordering can be different in other natural languages. Regular expression `[a-z]` may malfunction in other locales.
- Text notation of floating-point numbers is locale-dependent. `1,234.56` in the US corresponds to `1.234,56` in Germany.
- CSV normally uses `,` as separator, but uses `;` as separator in German locale.
- Han unification. Some characters in different languages with slightly different appearance use the same code point. Usually a font will contain variants for different languages that render these characters differently. HTML code

Regular expression
- Regular expressions cannot parse syntax that allows infinite nesting (because regular expression engines use finite state machines, and infinite nesting requires infinite states to parse). HTML allows infinite nesting. But it's ok to use regex to parse HTML of a specific website.
- Regular expression behavior can be locale-dependent (depending on which regular expression engine).
- There are many different "dialects" of regular expression. Don't assume a regular expression that works in JS can work in Java.
- A separate regular expression validation can be out-of-sync with actual data format. Crowdstrike incident was caused by a wrong separate regular expression validation. It's recommended to avoid separate regular expression validation. Reuse parsing code for validation. See also: Parse, don't validate
- Backtracking performance issue. See also: Cloudflare incident 2019 July-2, Stack Exchange incident 2016 July-20
Other
- YAML:
  - YAML is space-sensitive, unlike JSON. `key:value` is wrong. `key: value` is correct.
  - YAML doesn't allow using tabs for indentation.
  - Norway country code `NO` becomes false if unquoted.
  - Git commit hash may become a number if unquoted.
  - The yaml document from hell
- When using Microsoft Excel to open a CSV file, Excel will do a lot of conversions, such as date conversion (e.g. turn `1/2` and `1-2` into `2-Jan`) and Excel won't show you the original string. The gene SEPT1 was renamed due to this Excel issue. Excel will also make large numbers inaccurate (e.g. turn `12345678901234567890` into `12345678901234500000`) and won't show you the original accurate number, because Excel internally uses floating point for numbers.
- It's recommended to configure a billing limit when using cloud services, especially serverless. See also: ServerlessHorrors
- Big endian and little endian in binary files and network packets.
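
The endianness trap is easy to see by packing the same integer in both byte orders. Below is a minimal Python sketch using the standard `struct` module. The value `0x12345678` is just an example.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)     # big endian (network byte order)
little = struct.pack("<I", value)  # little endian

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Reading big-endian bytes as little endian silently gives a wrong number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x78563412
```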
Footnotes
1. Strictly speaking, they use WTF-16 encoding, which is similar to UTF-16 but allows invalid surrogate pairs. Also, Java has an optimization that uses Latin-1 encoding (1 byte per code point) for in-memory strings if possible. But the API of `String` still works on WTF-16 code units. Similar things may happen in C# and JS.
2. Directly treating existing binary data as a struct is undefined behavior because the object lifetime hasn't started. But using `memcpy` to initialize a struct is fine.